STAM101 :: Lecture 11 :: Attributes

Contingency table – 2x2 contingency table – Test for independence of attributes – test for goodness of fit of mendalian ratio

Test based on  -distribution

In case of attributes we can not employ the parametric tests such as F and t. Instead we have to apply test. When we want to test whether a set of observed values are in agreement with those expected on the basis of some theories or hypothesis. The statistic provides a measure of agreement between such observed and expected frequencies.

Chi-Square

The  test has a number of applications. It is used to

  • Test the independence of attributes
  • Test the goodness of fit
  • Test the homogeneity of variances
  • Test the homogeneity of correlation coefficients
  • Test the equaslity of several proportions.

In genetics it is applied to detect linkage.

Applications

– test for goodness of fit

A very powerful test for testing the significance of the discrepancy between theory and experiment was given by Prof. Karl Pearson in 1900 and is known as “chi-square test of goodness of fit “.

If 0i, (i=1,2,…..,n) is a set of observed (experimental frequencies) and Ei (i=1,2,…..,n) is the corresponding set of expected (theoretical or hypothetical) frequencies, then,
 
It follows a  distribution with n-1 d.f.  In case of only one tailed test is used.

 

 

Example

            In plant genetics, our interest may be to test whether the observed segregation ratios deviate significantly from the mendelian ratios. In such situations we want to test the agreement between the observed and theoretical frequency, such test is called as test of goodness of fit.

Conditions for the validity of -test:
-test is an approximate test for large values of ‘n’ for the validity of -test of goodness of fit between theory and experiment, the following conditions must be satisfied.

  • The sample observations should be independent.

 

2. Constraints on the cell freqrequency, if any, should be linear.    
Example:=.

3. N, the total frequency should be reasonably large, say greater then (>) 50.

4. No theoretical cell frequency should be less than (<)5. If any theoretical cell frequency is <5, then for the application of - test, it is pooled with the  preceding or scecceeding frequency so that the pooled frequency is more  than 5 and finally adjust for degree’s of freedom lost in pooling.

Example1
The number of yiest cells counted in a haemocytometer is compared to the theoretical value is given below. Does the experimental result support the theory?


No. of Yeast cells in the square

Obseved Frequency

Expected Frequency

0

103

106

1

143

141

2

98

93

3

42

41

4

8

14

5

6

5

 

Solution
H0: the experimental results support the theory
H1: the esperimental results does not support the theory.
Level of significance=5%
Test Statistic:
 


Oi

Ei

Oi­-Ei

(Oi­-Ei)2

(Oi­-Ei)2/Ei

103

106

-3

9

0.0849

143

141

2

4

0.0284

98

93

5

25

0.2688

42

41

1

1

0.0244

8

14

-6

36

2.5714

6

5

1

1

0.2000

400

400

 

 

3.1779

\=3.1779

Table value
(6-1=5 at 5 % l.os)= 11.070
Inference
<tab
We accept the null hypothesis.
(i.e) there is a good correspondence between theory and experiment.

test for independence of attributes

            At times we may consider two charactertistics on attributes simultaneously. Our interest will be to test the association between these two attributes
Example:- An entomologist may be interested to know the effectiveness of different concentrations of the chemical in killing the insects. The concentrations of chemical form one attribute. The state of insects ‘killed & not killed’ forms another attribute. The result of this experiment can be arranged in the form of a contingency table.  In general one attribute may be divided into m classes as A 1,A 2, …….A m  and the other attribute may be divided into  n classes as B 1,B 2, ……B n . Then the contingency table will have m x n cells. It is termed as m x n contingency table


A
B

A1

A2

Aj

Am

Row Total

B1

O11

O12

O1j

 

O1m

r1

B2

O21

O22

O2j

 

O2m

r2

.
.
.

 

 

 

 

 

 

 

Bi

Oij

Oi2

Oij

 

Oim

ri

.
.
.

 

 

 

 

 

 

 

Bn

On1

On2

Onj

 

Onm

rk

 Column Total

c1

c2

cj

cm

n=

where Oij’s are observed frequencies.
The expected frequencies corresponding to Oij is calculated as . The is computed as
    
where
Oij – observed frequencies
Eij – Expected frequencies
n= number of rows
m= number of columns
It can be verified that
This  is distributed as  with (n-1) (m-1) d.f.

2x2 – contingency table

            When the number of rows and numberof columns are equal to 2 it is termed as 2 x 2 contingency table .It will be in the following form

 

B1                   B2

Row Total

A1

A2

a                     b

c                      d

a+b       r1

c+d       r2

Column
Total

a+c                 b+d

c1                     c2

a+b+c+d
=n

Where a, b, c and d are cell frequancies c1 and c2 are column totals, r1 and r2 are row totals and n is the total number of observations.
In case of 2 x 2 contigency table  can be directly found using the short cut formula,
   
The d.f  associated with is (2-1) (2-1) =1

Yates correction for continuity
If anyone of the cell frequency is < 5, we use Yates correction to make as continuous. The yares correction is made by adding 0.5 to the least cell frequency and adjusting the other cell frequencies so that the column and row totals remain same . suppose, the firat cell frequency is to be corrected then the consigency table will be as follows:

 

B1

B2

Row Total

A1

A2

 a

b

a+b=r1

c

d

c+d =r2

Column
Total

a+c=c1

b+d=c2

n = a+b+c+d

Then  use the - statistic as

               
The d.f  associated with is (2-1) (2-1) =1

Exapmle 2
The severity of a disease and blood group were studied in a research projest. The findings sre given in the following table, knowmn as the m xn contingency table. Can this severity of the condition and blood group are associated.
Severity of a disease classified by blood group in 1500 patients.


Condition

Blood Groups

Total

O

A

B

AB

Severe

51

40

10

9

110

Moderate

105

103

25

17

250

Mild

384

527

125

104

1140

Total

540

670

160

130

1500

Solution
H0: The severity of the disease is not associated with blood group.
H1: The severity of the disease is associated with blood group.
Calculation of Expected frequencies


Condition

Blood Groups

Total

O

A

B

AB

Severe

39.6

49.1

11.7

9.5

110

Moderate

90.0

111.7

26.7

21.7

250

Mild

410.4

509.2

121.6

98.8

1140

Total

540

670

160

130

1500

Test statistic:
  
The d.f. associated with the   is (3-1)(4-1) = 6
Calculations

Oi

Ei

Oi­-Ei

(Oi­-Ei)2

(Oi­-Ei)2/Ei

51

39.6

11.4

129.96

3.2818

40

49.1

-9.1

82.81

1.6866

10

11.7

-1.7

2.89

0.2470

9

9.5

-0.5

0.25

0.0263

105

90.0

15

225.00

2.5000

103

111.7

-8.7

75.69

0.6776

25

26.7

-1.7

2.89

0.1082

17

21.7

-4.7

22.09

1.0180

384

410.4

-26.4

696.96

1.6982

527

509.2

17.8

316.84

0.6222

125

121.6

3.4

11.56

0.0951

104

98.8

5.2

27.04

0.2737

                                   Total

12.2347

\=12.2347
Table value of for 6 d.f. at 5% level of significance is 12.59
Inference
<tab
We accept the null hypothesis.
The severity of the disease has no association with blood group.

Example 3
In order to determine the possible effect of a chemical treatment on the rate of germination of cotton seeds a pot culture experiment was conducted. The results are given below
Chemical treatment and germination of cotton seeds

 

Germinated

Not germinated

Total

Chemically Treated

118

22

140

  Untreated

120

40

160

Total

238

62

300

Does the chemical treatrment improve the germination rate of cotton seeds?

Solution
H0:The chemical treatment does not improve the germination rate of cotton seeds.
H1: The chemical treatment improves the germination rate of cotton seeds.
Level of significance = 1%
Test statistic
  
 

 

Table value
(1) d.f. at 1 % L.O.S = 6.635
Inference
 <tab
We accept the null hypothesis.
The chemical treatmentwill not  improve the germination rate of cotton seeds significantly.

Example 4
In an experiment on the effect of a growth regulator on fruit setting in muskmelon the following results were obtained. Test whether the fruit setting in muskmelon and the application of growth regulator are independent at 1% level.

 

Fruit set

Fruit not set

Total

Treated

16

9

25

Control

4

21

25

Total

20

30

50

Solution
H0:Fruit setting in muskmelon does not depend on the application of growth regulator.
H1: Fruit setting in muskmelon depend on the application of growth regulator.
Level of significance = 1%
After Yates correction we have 

 

Fruit set

Fruit not set

Total

Treated

15.5

9.5

25

Control

4.5

20.5

25

Total

20

30

50

 

Tet statistic
    
 

Table value
(1) d.f. at 1 % level of significance is 6.635
Inference
 >tab
We reject the null hypothesis.
Fruit setting in muskmelon is influenced by the  growth regulator. Application of growth regulator will increase fruit setting in musk melon.


Download this lecture as PDF here